01 | Logistics & Motivation

Max Pellert (https://mpellert.at)

Deep Learning for the Social Sciences

https://mpellert.at

Professor for Social and Behavioural Data Science (interim, W2) at the University of Konstanz

Assistant Professor (Business School of the University of Mannheim)

I worked in industry at Sony Computer Science Laboratories in Rome, Italy

PhD from the Complexity Science Hub Vienna and the Medical University of Vienna in Computational Social Science

Studies in Psychology and History and Philosophy of Science

MSc in Cognitive Science and BSc in Economics (both University of Vienna)

(Some) Research interests

  • Computational Social Science

  • Digital traces

  • Affective expression in text

  • Natural Language Processing

  • Collective emotions

  • Belief updating

  • Psychometrics of AI

Giordano de Marzo

PostDoc at the University of Konstanz

Junior Research Fellow of the Complexity Science Hub Vienna

Consultant of the International Labour Office

PhD in Physics from Sapienza University, the Enrico Fermi Research Center and the Sapienza School for Advanced Studies in Rome

MSc in Theoretical Physics

Giordano de Marzo

Research Interests

  • Complex Digital Systems

  • Social Networks

  • Recommendation Algorithms

  • Large Language Models

  • AI

Who are you?

👀

Course program

Date Topic Who?
9.4. Logistics & Motivation Max
16.4. Supervised Learning Max
23.4. Shallow Neural Nets Max
30.4. Perceptron and Multi Layer Perceptrons Giordano
7.5. Convolutional Neural Networks Giordano
14.5. Graph Neural Networks Giordano
21.5. NN for Time Series analysis Giordano

Date Topic Who?
28.5. No class
4.6. Generative Deep Learning 1 Giordano
11.6. NLP 1 Max
18.6. NLP 2 Max
25.6. Reinforcement Learning Giordano
2.7. Large Language Models Max
9.7. Generative Deep Learning 2 Giordano
16.7. Outlook Max

Logistics

Lectures on Tuesday, 10:00 - 11:30, in C421

Exercises (practical sessions) are provided over the semester on Wednesday, 13:30 - 15:00, in D430

Your tutor will be Andri Rutschmann, who will co-teach the tutorials with us

Assignments will be released on Tuesday evening or, at the latest, on the Wednesday before the tutorial

Each assignment is due before a tutorial a few weeks after its release

Assignment submissions through GitHub as in ICSS

Four assignments count for 40% of the final grade of the course (10% each)

Project

Counts 60% of the grade

More information about the project will be delivered soon…

“Course Book”

Prince, S. J. D. (2023). Understanding deep learning. The MIT Press.

Available in print or for free as a PDF: https://udlbook.github.io/udlbook/

On the webpage you will find many additional materials

We will cover many topics from the book, and it also serves as additional material to deepen your knowledge of specific aspects

In addition to the contents covered in the book, this course keeps the focus on applications in the social sciences

Supervised Learning

Basic workflow: Define a mapping from input to output

Learn this mapping from paired input/output data examples

Often, the examples come from data sets of inputs that have been manually annotated by humans, i.e. the outputs are human-labeled supervisory signals

Often, the annotation is done by crowdworkers (if the task is not already outsourced to another model)

Regression

Univariate regression problem (one output, real value)

Fully connected network

Graph Regression

Multivariate regression problem (>1 output, real value)

Graph neural network

Text classification

Binary classification problem (two discrete classes)

Transformer network

Music genre classification

Multiclass classification problem (discrete classes, >2 possible values)

Recurrent neural network (RNN)

Image classification

Multiclass classification problem (discrete classes, >2 possible classes)

Convolutional network

What is a supervised learning model?

An equation relating input (age) to output (height)

Search through family of possible equations to find one that fits training data well

Deep neural networks are just a very flexible family of equations

Fitting deep neural networks = “Deep Learning”
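As a minimal illustration of "searching through a family of equations", the sketch below fits a straight line to made-up age/height pairs; all numbers are invented for illustration, and the line plays the role that a deep network plays for harder problems:

```python
import numpy as np

# Hypothetical training data: ages (years) and heights (cm)
ages = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
heights = np.array([86.0, 102.0, 115.0, 127.0, 138.0])

# Family of equations: height = a * age + b.
# "Learning" = searching for the (a, b) that fit the training
# data well, here via ordinary least squares.
a, b = np.polyfit(ages, heights, deg=1)

# Use the fitted equation to predict the height of a 5-year-old
predicted = a * 5.0 + b
```

A deep neural network is the same workflow with a far more flexible family of equations and far more parameters than (a, b).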

Image segmentation

Multivariate binary classification problem (many outputs, two discrete classes)

Convolutional encoder-decoder network

Depth estimation

Multivariate regression problem (many outputs, continuous)

Convolutional encoder-decoder network

Some terms to remember

  • Regression = continuous numbers as output

  • Classification = discrete classes as output

  • Two class (binary) and multiclass classification treated differently

  • Multilabel = zero or more of x discrete classes

  • Univariate = one output

  • Multivariate = more than one output

Translation

Transcription

Image generation from text

What do these examples have in common?

Very complex relationship between input and output

Sometimes we may have many possible valid answers (think of translation for example)

But outputs (and sometimes inputs) obey rules

Can we learn the “grammar” of the data from unlabeled examples?

Can use an enormous amount of data to do this (as we don’t need costly labels)

This has the potential to make the supervised learning task easier by providing a lot of general knowledge about possible outputs (about grammatically correct sentences, for example)

Unsupervised Learning

Learning about a dataset without labels

For example:

  • Clustering

  • Finding outliers

  • Generating new examples

  • Filling in missing data

In this course, we focus primarily on supervised approaches, but the boundaries are sometimes fuzzy, as "self-supervision" shows

Self-supervised Learning

We can also create large amounts of “free” labeled data ourselves with two main approaches:

Generative self-supervised learning masks part of each data example and the task is to predict the masked part (this way we get a “label”)

For example, take a corpus of unlabeled images, remove a part of each image and try to fill in (“inpaint”) the missing part

Or we might take a large corpus of text (from the internet) and mask some words that we then try to predict

Or we might take texts cut off at some point and try to predict the word that follows the cut-off
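A toy sketch of how such "free" labels arise from raw text alone, constructing both masked-word and next-word examples; the corpus and the word-level tokenisation are invented for illustration:

```python
# Invented mini-corpus standing in for a large unlabeled text collection
corpus = ["deep learning needs large amounts of data",
          "social scientists analyse digital traces"]

masked_examples = []     # (input with [MASK], label) pairs
next_word_examples = []  # (prefix, next word) pairs

for sentence in corpus:
    words = sentence.split()
    # Masked language modelling: hide each word in turn; the
    # hidden word becomes the "label" for free
    for i, word in enumerate(words):
        masked = words[:i] + ["[MASK]"] + words[i + 1:]
        masked_examples.append((" ".join(masked), word))
    # Next-word prediction: cut the text off and predict what follows
    for i in range(1, len(words)):
        next_word_examples.append((" ".join(words[:i]), words[i]))
```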

Self-supervised Learning

Contrastive self-supervised learning uses pairs of examples that have a relationship and compares them to unrelated pairs.

With images, we could set up the task to decide if pairs of images are transformed versions of one another or if they are unconnected

Or, with text, we can determine if two sentences follow each other in the original document or not

We can also establish if two sentences are logically related

–> A lot of potential for creative approaches using and transforming found data
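A minimal sketch of constructing such contrastive sentence pairs from a document: adjacent sentences form positive pairs, randomly drawn non-adjacent sentences form negatives. The document text is made up for illustration:

```python
import random

random.seed(0)

# Invented document standing in for found text data
document = ["The survey was run in 2020.",
            "It covered five countries.",
            "Results were published later.",
            "Cats are popular online."]

pairs = []  # (sentence_a, sentence_b, label): 1 = adjacent, 0 = not

# Positive pairs: sentences that follow each other in the original
for a, b in zip(document, document[1:]):
    pairs.append((a, b, 1))

# Negative pairs: random sentence pairs that are not adjacent in order
for _ in range(len(document) - 1):
    a, b = random.sample(document, 2)
    if document.index(b) != document.index(a) + 1:
        pairs.append((a, b, 0))
```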

Landmarks in Deep Learning

  • 1958 Perceptron (Simple “neural” model)

  • 1986 Backpropagation (Practical deep neural networks)

  • 1989 Convolutional networks (Supervised learning)

  • 2012 AlexNet Image classification (Supervised learning)

  • 2014 Generative adversarial networks (Unsupervised learning)

  • 2014 Deep Q-Learning - Atari games (Reinforcement learning)

  • 2016 AlphaGo (Reinforcement learning)

  • 2017 Machine translation (Supervised learning)

  • 2019 Language models ((Un)supervised learning)

  • 2022 Dall-E2 Image synthesis from text prompts ((Un)supervised learning)

  • 2022 ChatGPT ((Un)supervised learning)

  • 2023 GPT4 Multimodal model ((Un)supervised learning)

Applications of deep learning in the social sciences

The Hugging Face Model Hub can give you an idea about the vast number of possible application areas

Also check out the Dataset Hub and Hugging Face Spaces

Spaces are often used for demos and to showcase interesting models and their applications

You can also rent dedicated hardware (billed by the minute, usually very cheap) to run spaces privately without queues

Examples from NLP

Endless research opportunities using “Text as Data”

Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as data: A new framework for machine learning and the social sciences. Princeton University Press.

Text data can come from social media for example and be analysed for sentiment, emotions, arguments, stance, …
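In practice one would use trained models (for example from the Hugging Face Hub) for such analyses; as a dependency-free sketch of the underlying idea, here is a toy lexicon-based sentiment scorer, with word lists invented for illustration:

```python
# Toy lexicon-based sentiment scoring of short texts.
# Real analyses use trained models, but the principle is the same:
# map text to a position on a sentiment scale.
POSITIVE = {"great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "hate", "sad", "terrible"}

def sentiment(text: str) -> int:
    """Count positive minus negative lexicon hits in the text."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

score = sentiment("I love this great course")
```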

Synthetic Data

“We propose and explore the possibility that language models can be studied as effective proxies for specific human subpopulations in social science research.”

“Practical and research applications of artificial intelligence tools have sometimes been limited by problematic biases (such as racism or sexism), which are often treated as uniform properties of the models. We show that the “algorithmic bias” within one such tool—the GPT-3 language model—is instead both fine-grained and demographically correlated, meaning that proper conditioning will cause it to accurately emulate response distributions from a wide variety of human subgroups. We term this property algorithmic fidelity and explore its extent in GPT-3.”

“We create ‘silicon samples’ by conditioning the model on thousands of sociodemographic backstories from real human participants in multiple large surveys conducted in the United States. We then compare the silicon and human samples to demonstrate that the information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and sociocultural context that characterize human attitudes.”

Many other examples

Synthetic data: in-silico replication of experiments

In the vision domain: image classification, for example of satellite images to estimate event attendance (by counting cars) or migration flows

Whisper: analysis of transcripts of videos (for example from YouTube) with NLP models

In the (near) future, more tools for analysing videos directly?

Promising advances in video generation for example

Deep learning is not always what you need

“[…] using two-parameter logistic regression (that is, one neuron) and obtain the same performance as that of the 13,451-parameter DNN.”

“We further show that a logistic regression based on the measured distance and mainshock average slip (instead of derived stresses) performs better than the DNN.”

“Before commenting on the interesting philosophical issues raised by Mignan and Broccardo, I note that the authors were able to reproduce the results presented in our paper (available at https://github.com/phoebemrdevries/Learning-aftershock-location-patterns).”

“The perspective presented in our paper is that it was interesting to discover that a neural network learned a simple, non-exotic combination of stresses that provided considerably improved precision.”
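For reference, a "two-parameter logistic regression (that is, one neuron)" is just a sigmoid applied to one weight and one bias. The sketch below fits one by gradient descent on synthetic 1-D data (not the aftershock data from the paper):

```python
import numpy as np

# Synthetic data: label is a noisy threshold on a single feature
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (x + 0.1 * rng.normal(size=200) > 0).astype(float)

# Two parameters: one weight w and one bias b ("one neuron")
w, b = 0.0, 0.0
lr = 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))  # sigmoid prediction
    grad_w = np.mean((p - y) * x)            # gradient of the log loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

# Training accuracy of the fitted one-neuron model
accuracy = np.mean(((1.0 / (1.0 + np.exp(-(w * x + b)))) > 0.5) == y)
```

On problems where such a simple decision boundary suffices, the extra thousands of parameters of a deep network buy nothing, which is exactly the point of the exchange quoted above.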

Hype vs. Underclaiming

Hype

The AI industry, like many other parts of the economy, depends heavily on the attention of all kinds of stakeholders to attract funding

Fanning the flames with musings about the close possibility of Artificial General Intelligence (AGI) is part of that game

We also see special emphasis on extreme dangers that are extremely unlikely (if possible at all)

This can also, maybe counterintuitively, be seen as beneficial

Polemically: “Fund us because only we can protect you”

Inflated expectations can also backfire, see "AI Winter"

Underclaiming

On the other hand, it's impossible to deny the progress of recent years in areas such as NLP, and in modalities other than text, such as image and video (analysis and generation)

Standardized benchmark tests are indicators of the speed of progress (but still imperfect measures)

There is also a rewarding niche for experts who by default talk down each and every achievement

Often, these kinds of experts have little to contribute beyond that general criticism

There is a lot of questionable information around on “AI”

Benchmark Progress Example